Leap-based Content Defined Chunking - Theory and Implementation

Authors

  • Chuanshuai Yu
  • Chengwei Zhang
  • Yiping Mao
  • Fulu Li

Abstract

Content Defined Chunking (CDC) is an important component of data deduplication, affecting both the deduplication ratio and deduplication performance. The sliding-window-based CDC algorithm and its variants have been the most popular CDC algorithms for the last 15 years. However, their performance is limited in certain application scenarios because they must slide the window byte by byte. The authors present a leap-based CDC algorithm that significantly improves deduplication performance without compromising the deduplication ratio. Compared with the sliding-window-based CDC algorithm, the new algorithm achieves up to a twofold performance improvement.

Keywords: deduplication; content defined chunking; judgment function; secondary condition
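
The abstract does not specify the judgment function or the secondary condition, so the following is only a minimal sketch of the leap principle under stated assumptions, not the authors' implementation. It assumes a cut point is declared where M consecutive windows all pass a hypothetical judgment function (here, a 16-byte window passes when the low bit of its CRC32 is zero). The secondary condition is not modeled. All names (`Judge`, `bounds`, `sliding_cut`, `leap_cut`, `chunk`, `WINDOW`, `M`, `MIN_SIZE`, `MAX_SIZE`) are illustrative. The sliding baseline evaluates one window per byte. The leap version tests the M windows ending at a candidate position, nearest first. When the window ending at `c - j` fails, every candidate up to `c - j + M - 1` also contains that window, so the search jumps directly to `c + (M - j)`.

```python
import random
import zlib

# Illustrative parameters (assumptions, not values from the paper).
WINDOW = 16            # bytes hashed by one judgment
M = 10                 # consecutive windows that must pass at a cut point
MIN_SIZE, MAX_SIZE = 256, 8192


class Judge:
    """Hypothetical judgment function: the window data[end - WINDOW:end]
    passes when the low `bits` bits of its CRC32 are zero. Counts calls."""

    def __init__(self, data: bytes, bits: int = 1):
        self.data, self.mask, self.calls = data, (1 << bits) - 1, 0

    def __call__(self, end: int) -> bool:
        self.calls += 1
        return (zlib.crc32(self.data[end - WINDOW:end]) & self.mask) == 0


def bounds(start, n):
    """Smallest admissible cut (every window stays inside the chunk) and the
    hard limit (MAX_SIZE or end of data) for a chunk beginning at `start`."""
    lo = start + max(MIN_SIZE, M + WINDOW - 1)
    return lo, min(start + MAX_SIZE, n)


def sliding_cut(judge, start, n):
    """Baseline: one judgment per byte, tracking the run of passing windows."""
    lo, limit = bounds(start, n)
    run = 0
    for end in range(lo - M + 1, limit + 1):
        run = run + 1 if judge(end) else 0
        if run >= M:          # windows end-M+1 .. end all passed
            return end
    return limit


def leap_cut(judge, start, n):
    """Leap: test the M windows ending at candidate c, nearest first. If the
    window ending at c - j fails, every candidate up to c - j + M - 1 also
    contains it, so the next candidate is c + (M - j)."""
    lo, limit = bounds(start, n)
    c = lo
    while c <= limit:
        for j in range(M):
            if not judge(c - j):
                c += M - j
                break
        else:
            return c          # all M windows passed: content-defined cut
    return limit


def chunk(data, find_cut, bits=1):
    """Return the cut offsets (exclusive chunk ends) and the judgment count."""
    judge = Judge(data, bits)
    cuts, start = [], 0
    while start < len(data):
        start = find_cut(judge, start, len(data))
        cuts.append(start)
    return cuts, judge.calls


if __name__ == "__main__":
    rng = random.Random(0)
    data = bytes(rng.getrandbits(8) for _ in range(1 << 18))
    for name, fn in (("sliding", sliding_cut), ("leap", leap_cut)):
        cuts, calls = chunk(data, fn)
        print(f"{name}: {len(cuts)} chunks, {calls} judgment calls")
```

In this sketch both functions return the same cut points by construction, so the chunking (and hence the deduplication result) is unchanged. Only the number of judgment evaluations differs, and the size of that gap depends on the assumed parameters. After a leap, windows that already passed may be evaluated again; caching them would save further calls.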

Similar Articles

Survey of Research on Chunking Techniques

The explosive growth of data produced by different devices and applications has contributed to the abundance of big data. To process such amounts of data efficiently, strategies such as de-duplication have been employed. Among the three levels of de-duplication, namely file level, block level and chunk level, de-duplication at chunk level, also known as byte level, is the most popular a...

Two Stage Max Gain Content Defined Chunking for De- duplication

Data de-duplication is a very simple concept with very smart technology associated with it. Data blocks are stored only once: de-duplication systems decrease storage consumption by identifying distinct chunks of data with identical content. They then store a single copy of the chunk along with metadata about how to reconstruct the original files from the chunks; this takes up less stora...

Accelerating Data Deduplication by Exploiting Pipelining and Parallelism with Multicore or Manycore Processors

As the amount of digital data grows explosively, data deduplication has gained increasing attention for its space-efficient functionality, which not only reduces storage space requirements by eliminating duplicate data but also minimizes the transmission of redundant data in data-intensive storage systems. Most existing state-of-the-art deduplication methods remove redundant data at either ...

Implementation of a File System with Encryption and De-duplication

With the rapid advance of society, especially the development of computer technology, network technology and information technology, there is an increasing demand for systems that can provide secure data storage in a cost-effective manner. In this paper, we propose a prototype file system called EDFS (Encryption and De-duplication File System), which provides both data security and space effici...

Leap Zagreb indices of trees and unicyclic graphs

Let $d(v|G)$ and $d_2(v|G)$ denote the numbers of first and second neighbors of the vertex $v$ of the graph $G$. The first, second, and third leap Zagreb indices of $G$ are defined as $LM_1(G) = \sum_{v \in V(G)} d_2(v|G)^2$, $LM_2(G) = \sum_{uv \in E(G)} d_2(u|G)\, d_2(v|G)$, and $LM_3(G) = \sum_{v \in V(G)} d(v|G)\, d_2(v|G)$, respectively. In this paper, we generalize the results of Naji et al. [Commun. Combin. Optim. ...

Publication date: 2015